Abstract

Background: Although the importance of proteins of the biomineral organic matrix and their posttranslational 
modifications for biomineralization is generally recognized, the number of published matrix proteomes is still small. 
This is mostly due to the lack of comprehensive sequence databases, usually derived from genomic sequencing 
projects. However, in-depth mass spectrometry-based proteomic analysis, which critically depends on high-quality 
sequence databases, is a very fast tool to identify candidates for functional biomineral matrix proteins and their 
posttranslational modifications. Identification of such candidate proteins is facilitated by at least approximate 
quantitation of the identified proteins, because the most abundant ones may also be the most interesting candidates 
for further functional analysis. 

Results: Re-quantification of previously identified Lottia shell matrix proteins using the intensity-based absolute 
quantification (iBAQ) method as implemented in the MaxQuant identification and quantitation software showed 
that only 57 of the 382 accepted identifications constituted 98% of the total identified matrix proteome. This 
group of proteins did not contain obvious intracellular proteins, such as cytoskeletal components or ribosomal 
proteins, invariably identified as minor components of high-throughput biomineral matrix proteomes. Fourteen of 
these major proteins were phosphorylated to a variable extent. Altogether we identified 52 phospho sites in 20
of the 382 accepted proteins with high confidence. 

Conclusions: We show that iBAQ quantitation may be a useful tool to narrow down the group of functional 
biomineral matrix protein candidates for further research in cell biology, genetics or materials research. Knowledge 
of posttranslational modifications in these major proteins could be a valuable addition to previously published 
proteomes. This is true especially for phosphorylation, because this modification was already shown to modify 
mineralization processes in some instances. 



Introduction 

Phosphorylation is one of the most widespread post- 
translational modifications of proteins and also occurs in 
the organic matrix of biominerals [1,2]. Protein FAM20C 
has recently been identified as a kinase involved in phos- 
phorylation of such secreted proteins [3,4], but other 
kinases may also be involved [5,6]. In a few cases experi- 
mental evidence indicated an important function for 
phospho groups in biomineral matrix proteins. The 
best-examined matrix phosphoprotein in this respect is 
mammalian osteopontin, first described as a major non-collagenous
bone protein. Among the many functions
suggested for this protein since its discovery (reviewed, for 
instance, in [7,8]) is also phosphorylation-dependent in- 
hibition of mineralization processes [9]. Removal of phos- 
pho groups by alkaline phosphatase significantly reduces 
its inhibitory potential in in vitro crystallization assays [10] 
and un-phosphorylated recombinant osteopontin, but 
not in vitro phosphorylated osteopontin, fails to inhibit 
mineralization of human smooth muscle cell cultures 
serving as a model for human vascular calcification 
[11]. A crucial role of phosphorylated residues in the 
interaction with mineral is also reported for dentin 
matrix protein 1 and dentin phosphophoryn [12,13]. 
The only invertebrate example so far is orchestin, a
major matrix protein from crustacean calcium storage 
structures. Phosphorylation of orchestin is necessary for 
calcium binding of the protein [14]. 

The recently published genomes of biomineralizing 
organisms enable high-throughput mass spectrometry- 
based analysis of biomineral proteomes and phospho- 
proteomes, thus facilitating the fast identification of 
phosphoproteins and phosphorylation sites [15,16]. In 
the present study we add the phosphoproteome of the 
Lottia gigantea shell matrix to the recently published 
Lottia shell proteomes [17,18]. Furthermore, we have 
re-quantitated the Lottia shell proteome using the iBAQ 
(intensity-based absolute quantification) method [19] as 
implemented in MaxQuant. This showed that 57 pro- 
teins make up 98% of the total identified proteome. We 
suggest that quantitation allows the identification of 
major proteins, which are the most likely candidates for 
functional shell proteins, while retaining information 
about minor proteins, irrespective of whether these minor 
proteins play a role in mineralization or not, and irrespect- 
ive of whether they occur intra- or extra-crystalline. 

Materials and methods 

Matrix and phosphopeptide preparation 

Lottia shell matrix was prepared as previously described 
[17] using method B for shell cleaning (2 h sodium 
hypochlorite incubation with 2x5 min ultrasound treat- 
ment). Reduction, carbamidomethylation and enzymatic 
cleavage of matrix proteins were performed using a 
modification of the FASP (Filter-aided sample prepar- 
ation) method [20] as outlined below. Two-mg aliquots 
of acid-soluble or acid-insoluble shell matrix were suspended
in 300 µl of 0.1 M Tris, pH 8, containing 6 M
guanidine hydrochloride and 0.01 M dithiothreitol (DTT). 
This mixture was heated to 56°C for 60 min, cooled to 
room temperature, and centrifuged at 13000 rpm in an 
Eppendorf bench-top centrifuge 5415D for 15 min. The 
supernatant was loaded into an Amicon Ultra 0.5 ml 30 K 
filter device (Millipore; Tullagreen, Ireland). DTT was
removed by centrifugation at 13000 rpm for 15 min and
washing with 2 x 1 vol of the same buffer. Carbamidomethylation
was done in the device using 0.1 M Tris
buffer, pH 8, containing 6 M guanidine hydrochloride and
0.05 mM iodoacetamide and incubation for 45 min in the 
dark. Carbamidomethylated proteins were washed with 
0.05 M ammonium hydrogen carbonate buffer, pH 8, containing
2 M urea, and centrifugation as before. Trypsin
(20 µg, sequencing grade, modified; Promega, Madison,
USA) was added in 40 µl of 0.05 M ammonium hydrogen
carbonate buffer containing 2 M urea and the devices 
were incubated at 37°C for 16 h. Peptides were collected 
by centrifugation and the filters were washed twice with 
40 µl of 0.05 M ammonium hydrogen carbonate buffer.



The peptide solution was acidified to pH 1-2 with trifluor- 
oacetic acid (TFA) and peptides were vacuum-dried in an 
Eppendorf concentrator. 

Phosphopeptides were enriched by reversible binding 
to TiO2 beads (Titansphere 10 µm, GL Sciences, Japan)
following established protocols [21] but substituting
2,5-dihydroxybenzoic acid in the loading buffer by 6%
trifluoroacetic acid (TFA) [22]. Briefly, beads were washed
first in 80% acetonitrile containing 0.1% TFA (washing 
buffer), then in 80% acetonitrile containing 6% TFA (bind- 
ing buffer). Peptides were dissolved in binding buffer 
(200 µl/peptides of 2 mg matrix) and added to approximately
5 mg of loosely pelleted TiO2 beads. The mixture
was incubated on a rotating wheel for 45 min. After 
centrifugation the supernatant was again incubated with 
fresh TiO2 beads as before. The beads were then washed
twice with 200 µl of binding buffer followed by 2 x
200 µl of washing buffer. Finally the loaded beads were
filled into C8 Stage Tips and phosphopeptides were
eluted with 2 x 100 µl of a solution containing 40%
acetonitrile and 15% ammonia. The eluate was vacuum-dried
in an Eppendorf concentrator to ~20 µl and acidified
with TFA. The peptides were purified on C18 Stage
Tips [23] after dilution to 200 µl with 0.5% acetic acid.

LC-MS analysis 

Phosphopeptide-enriched samples were analysed on a Q 
Exactive high-performance Quadrupole Orbitrap mass 
spectrometer (Thermo Fisher Scientific, Bremen, Germany) 
[24] connected to an Easy-nLC 1000 nanoflow HPLC 
system (Thermo Fisher Scientific). Peptides were separated
on a 50 cm column with an inner diameter of 75
µm filled with 1.8 µm C18 beads (Reprosil-AQ Pur,
Dr. Maisch GmbH, Ammerbuch, Germany) prepared as 
described [25]. Peptides were eluted with acetonitrile in 
0.1% formic acid using a gradient of 5-30% acetonitrile 
in 95 min, 30-60% in 30 min and 60-95% in 8 min at a
flow of 250 nl/min and a column temperature of 50°C 
[25]. Mass spectra were acquired in a data-dependent 
manner by automatically switching between MS and 
MS/MS in a top 10 approach. The resolution was 70000 
for full spectra and 17500 (both at m/z 200) for HCD- 
derived fragments. The dynamic exclusion time was 30 sec. 

Data analysis 

To estimate the percentage of each protein in the total 
identified shell proteome, raw-files used in a previous 
study [17; method B] were re-analysed using the iBAQ 
(intensity-based absolute quantification) method [19] as 
implemented in MaxQuant version 1.3.9.21. Carbamido- 
methylation was set as fixed modification, variable modi- 
fications were acetyl (protein N-term), oxidation (M), 
pyro-Glu (Q,E) and phospho (STY). Maximal FDR for 
peptide spectral match, proteins and site was set to 0.01. 
The maximal peptide PEP was 0.01. Minimal peptide
length was 7 amino acids. The minimal score for modi- 
fied peptides was 50 and the minimal delta score for 
modified peptides was 17. A minimum of two sequence- 
unique peptides was required for identification, except 
for proteins that were identified with two or more 
unique peptides previously in separately analysed acid- 
soluble and acid-insoluble fractions [17]. In very few 
cases new proteins were accepted with one unique pep- 
tide if this peptide occurred several times in different 
fractions and with an abundance of >0.01. The second 
peptide option was activated to enable identification of 
co-eluting peptides with very similar mass [26]. Two 
missed cleavages were allowed. The databases used were
Lottia FilteredModels (Lotgi1_GeneModels_FilteredModels1_aa.fasta.gz)
and Lottia AllModels (Lotgi1_GeneModels_AllModels_20070424_aa.fasta.gz)
[27] downloaded from
(http://jgi.doe.gov/), and a LOTGI subset of UniProtKB
v2013_7 entries downloaded from http://www.uniprot.org/.
MaxQuant automatically supplemented these databases with reversed
sequences and common contaminants, which were used for
quality control and FDR setting. Phosphopeptides
were accepted if they occurred at least twice or
were confirmed by analysis of phosphopeptide-enriched 
samples. 
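
The digestion-related search settings above (cleavage by trypsin, up to two missed cleavages, minimal peptide length of 7 amino acids) can be illustrated with a minimal in-silico digest. This is only a sketch and not the MaxQuant implementation: the function name `tryptic_peptides` is hypothetical, the toy sequence is not a Lottia protein, and the cleavage rule (after every K or R, with no special handling of proline) is a simplifying assumption.

```python
def tryptic_peptides(sequence, missed_cleavages=2, min_length=7):
    """Return the set of peptides from a simplified in-silico tryptic digest."""
    # Assumption: cleave C-terminal to every K or R; proline is not treated specially.
    fragments, start = [], 0
    for position, residue in enumerate(sequence):
        if residue in "KR":
            fragments.append(sequence[start:position + 1])
            start = position + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal fragment without K/R

    peptides = set()
    for first in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last_allowed = min(first + missed_cleavages, len(fragments) - 1)
        for last in range(first, last_allowed + 1):
            peptide = "".join(fragments[first:last + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


# Hypothetical toy sequence with two internal cleavage sites.
toy_sequence = "GGGGGGKSSSSSSRAAAAAAA"
print(sorted(tryptic_peptides(toy_sequence)))
```

Allowing missed cleavages enlarges the search space, which is one reason why the number of allowed missed cleavages is reported together with the FDR settings.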

Peptide mixtures for enrichment of phosphopeptides 
were prepared from three biological replicates prepared 
according to method B of [17]. The acid-soluble and the 
acid-insoluble matrix of each biological replicate were 
used to prepare five technical replicates, resulting in 30 
raw files that were evaluated together using MaxQuant 
[26,28] version 1.3.9.21 with the same settings as above,
except that a minimum of only one sequence-unique phosphopeptide
was required, provided it was sequenced at least twice and in different
replicates. The decoy mode was set to revert in MaxQuant.
Phosphopeptide spectra were validated using the
MaxQuant Expert system, which provides additional 
fragment annotations not included in the routine anno- 
tation [29]. Criteria were the assignment of major peaks, 
occurrence of uninterrupted y- or b-ion series of at 
least four consecutive amino acids, preferred cleavages 
N-terminal to proline bonds, the possible presence of 
a2/b2 ion pairs, the presence of immonium ions, and 
mass accuracy. In general only phosphopeptide identi- 
fications with a localization probability of >0.75 were 
accepted. However, in some cases adjacent residues, 
such as X(n)-S-S-X(n), could not be resolved with the 
fragmentation pattern of the respective phosphopeptides, 
making it impossible to exactly localize the phosphory- 
lation site. As a result, lower localization probability scores 
were attributed to several residues. Such phosphopep- 
tides were also accepted. Phospho sites were searched 
for known kinase motifs using Phosida Motif Matcher 
(http://www.phosida.com/) [30,31] and PhosphoMotif Finder
(http://www.hprd.org/PhosphoMotif_finder) [32]. Most
sequence-unique peptides were identified several times 
and site occupancy of phospho sites was estimated by 
comparing the number of unmodified to the number of 
phosphorylated forms of individual peptides. 
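
The occupancy estimate described above can be sketched as a simple ratio of identification counts. The exact formula is not given in the text, so the fraction used here (phosphorylated identifications divided by all identifications of the same peptide) is an assumption, and the function name and counts are hypothetical.

```python
def estimate_site_occupancy(n_phosphorylated, n_unmodified):
    """Estimate occupancy as the fraction of identifications carrying the phospho group."""
    total = n_phosphorylated + n_unmodified
    if total == 0:
        raise ValueError("peptide was not identified in either form")
    return n_phosphorylated / total


# Hypothetical counts: a peptide seen 3 times phosphorylated and 9 times unmodified.
print(estimate_site_occupancy(3, 9))
```

Such a count-based ratio is at best a rough estimate, because phosphorylated and unmodified forms of a peptide need not be detected with the same efficiency.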

Sequence similarity searches were performed with 
FASTA (http://www.ebi.ac.uk/Tools/sss/fasta/) [33] against 
current releases of the UniProt Knowledgebase (UniProtKB).
Other bioinformatics tools used were Clustal
Omega for sequence alignments (http://www.ebi.ac.uk/Tools/msa/clustalo/)
[34], InterPro (http://www.ebi.ac.uk/interpro)
[35] for domain predictions, and SignalP 4.1
(http://www.cbs.dtu.dk/services/SignalP/) [36] for signal
sequence prediction. Amino acid composition and theoretical
pI were determined using the ProtParam tool
provided by the ExPASy server (http://web.expasy.org/protparam/)
[37]. Intrinsically disordered protein structure
was predicted using IUPred (http://iupred.enzim.hu/)
[38] and methods provided by the PredictProtein
2013 server (https://www.predictprotein.org/) [39,40].
GO categories for subcellular location were derived 
from UniProt and Lottia database entries, signal se- 
quence predictions and similarity to known proteins. 

Results and discussion 

Re-analysis and re-quantitation of Lottia shell proteins 
with MaxQuant-implemented iBAQ

In search of the reasons for apparent differences in previ- 
ously published Lottia shell proteomes [17,18] we noticed 
that database searches were done using the AllModels 
database in [18] while [17] used the FilteredModels data- 
base containing entries supported by EST sequences. 
Therefore we re-analyzed the raw-files produced previ- 
ously for acid-soluble and acid-insoluble matrix prepared 
according to method B [17] (also used to identify phos- 
phoproteins in the present report) using a combination 
of both databases and a subset of UniProt containing
Lottia gigantea entries. Furthermore, to determine the
approximate abundances of the identified proteins, the 
iBAQ (intensity-based absolute quantification) method 
[19] as implemented in more recent MaxQuant versions 
was enabled in this search. The previously used [17] 
emPAI method [41] belongs to the spectral count methods 
based on counting the number of identified unique parent 
ions per protein. In contrast, iBAQ and similar algorithms 
are called intensity-based because they calculate the sum 
of parent ion intensities of identified peptides per protein. 
In both types of methods, the numbers of theoretically 
possible peptides per protein for the protease used in sam- 
ple preparation enter the equation to account for different 
protein lengths and distribution and frequency of cleavage 
sites. Comparisons of the two types of methods
show a higher accuracy of the intensity-based methods,
including iBAQ (for instance [42]), indicating that they
should be given preference. Furthermore, the emPAI
method in its original form [41] as we used it has become 
somewhat obsolete because of the recent progress in tech- 
nology. For instance, modern mass spectrometers and the 
associated software provide high-confidence identifica- 
tions of much longer peptides than previously possible. 
These long peptides are not included in
emPAI calculations [41], but are included in iBAQ
calculations.
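
The difference between the two families of methods can be made concrete with a short sketch. It follows the definitions as commonly stated for the two methods (emPAI = 10^(observed/observable peptides) - 1 [41]; iBAQ = summed peptide intensities divided by the number of theoretically observable peptides [19]); the function names and all numbers are hypothetical, and rules for which peptides count as observable are deliberately left out.

```python
def empai(n_observed_peptides, n_observable_peptides):
    """Spectral-count style index: 10 ** (observed / observable) - 1."""
    return 10 ** (n_observed_peptides / n_observable_peptides) - 1


def ibaq(peptide_intensities, n_theoretical_peptides):
    """Intensity-based index: summed peptide intensities per theoretical peptide."""
    return sum(peptide_intensities) / n_theoretical_peptides


# Hypothetical protein: 5 of 10 observable peptides identified.
print(empai(5, 10))
# Hypothetical parent ion intensities of three identified peptides.
print(ibaq([1.0e6, 3.0e6, 4.0e6], 10))
```

In this toy case emPAI depends only on how many peptides were identified, whereas the iBAQ value changes with the measured intensities of those peptides.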

Irrespective of the quantitation method, accurate quantitation
certainly also depends on the quality and completeness
of the available sequence databases. Sequences
not contained in the database can be neither identified 
by high-throughput mass spectrometry-based proteomic 
analysis nor quantitated. The same applies to sequences 
having no cleavage sites for the protease used in sample 
preparation. Faulty combination of sequences belonging to 
different proteins into one database entry or unnoticed 
faulty allocation of fragments of one protein to different 
database entries can all bias quantitation results. Finally, 
the abundance of proteins bearing many posttranslational 
modifications will be underestimated if the modification is 
not included in the analysis. In spite of these caveats we 
believe that routine quantitation of proteins in in-depth 
proteomic studies may be a useful tool to identify possible 
functionally important proteins for further study. We 
express the abundances as percentage of the identified 
proteome, obtained by normalizing the iBAQ intensities 
to the sum of all intensities. While the decision what to 
count as a major protein or a minor protein still remains 
arbitrary, it may now be more comprehensible to the 
reader and will possibly facilitate the decision of which 
proteins to choose for further studies. 
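
The normalization described above, and the selection of major proteins by a percentage threshold, can be sketched as follows. This is an illustration under stated assumptions: the function names and iBAQ values are hypothetical, and the threshold is applied as "equal to or larger than", as in the title of Table 1.

```python
def percent_of_identified_proteome(ibaq_by_protein):
    """Normalize iBAQ intensities to the sum of all intensities, in percent."""
    total = sum(ibaq_by_protein.values())
    return {protein: 100.0 * value / total
            for protein, value in ibaq_by_protein.items()}


def select_major_proteins(percentages, threshold_percent):
    """Return proteins at or above the threshold and their combined share."""
    major = {protein: pct for protein, pct in percentages.items()
             if pct >= threshold_percent}
    return major, sum(major.values())


# Hypothetical iBAQ intensities for four proteins.
toy_ibaq = {"protein_A": 600.0, "protein_B": 300.0,
            "protein_C": 99.5, "protein_D": 0.5}
percentages = percent_of_identified_proteome(toy_ibaq)
major, combined_share = select_major_proteins(percentages, threshold_percent=0.1)
print(len(major), round(combined_share, 2))
```

Because the percentages refer to the identified proteome only, they change whenever the set of accepted identifications changes.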

The results of this new search (Additional file 1:
Table S1) now include all proteins published by [18]
and contain 496 proteins/protein groups. Of these,
382 protein/protein group identifications were accepted 
(Additional file 2: Table S2) according to the rules stated 
in the Materials and Methods section. Twenty-three proteins
were identified in the AllModels database only or in
combination with the UniProt entries, including several 
very abundant ones (Table 1). Many groups contained 
several AllModels entries testifying to the high redun- 
dancy in this database. The corresponding MaxQuant 
table with protein data is contained in Additional file 1
(Additional file 1: Table S1), which also includes identifications
not accepted. These were, for instance, identifications
with only one single peptide with low scores or
insufficient sequence coverage. The peptide data of the 
more than 4000 sequence-unique peptides, including 
peptide sequences and scores, are shown in Additional 
file 3 (Additional file 3: Table S3). 

Quantitation with iBAQ showed that only 18 proteins/
protein groups, each with a percentage of more than 1% of the
identified proteome, already constituted approximately
82% of the entire identified proteome (Table 1). This
group comprised two very abundant (>1%) proteins not
contained in the FilteredModels database, the Asp-, Gly-,
Lys- and Ser-rich peroxidase-like protein-1 (DGLSP_LOTGI/Lotgi1|162078)
and the Gly- and Ser-rich protein-1
(GSP1_LOTGI/Lotgi1|239214) [18]. If a percentage of
larger than 0.1% was chosen as a threshold, a total of 57 
proteins (Table 1) amounted to approximately 98% of the 
total identified proteome. These included CCD2 (coiled- 
coil domain-containing protein 2; Lotgi1|234936), the
perlwapin-like protein PWAP_LOTGI/Lotgi1|239121, and
the EGF-like domain-containing protein 2 (ELDP2/Lotgi1|167423)
[15], which were contained in the AllModels
database but not in the FilteredModels database. Almost 
all proteins also identified in [18] were contained in this 
fraction of the proteome. Exceptions were the EF-hand 
calcium-binding domain-containing protein 1 and 2 
(EFCB1/B3A0Q5, EFCB2/B3A0R9), and Threonine-rich 
protein LUSP-15/TRP/B3A0R4, which apparently belonged 
to the minor components of the identified proteome 
(Additional file 2: Table S2). However, we also identified 
several entries with a high similarity to EFCB2 based on 
sequence overlaps with sequence identities of 43-90% 
(Figure 1). Taken together, this protein family constituted 
slightly more than 0.1% of the identified proteome.

In agreement with a previous study [18] the major pro- 
teins comprised three peroxidase-like proteins (Table 1) 
including the most abundant protein Lotgi1|162078/DGLSP_LOTGI.
Peroxidases are a large and widespread
family of enzymes catalysing redox reactions using a 
variety of electron donors and acceptors, including 
organic molecules. Peroxidases have been implicated 
previously in mollusc shell formation [43]. Possibly they 
are responsible for the sclerotization of the periostra- 
cum [44-46], a proteinaceous layer confining the mantle 
cavity before the start of mineralization. As discussed 
previously [18] one may hypothesize that peroxidases 
function in stabilization of the newly secreted matrix by 
cross-linking some of its components. Another major 
protein, the abundance of which was noticed only using 
the AllModels database because the FilteredModels only 
contained a small fragment, was Lotgi1|166131. In this
protein a long stretch of sequence with predicted disor- 
dered structure is followed by a predicted superoxide 
dismutase domain. Superoxide dismutases are a family 
of enzymes with widespread subcellular distribution that 
remove superoxide, a normal aerobic metabolite. One 
reaction product of superoxide dismutases is H2O2, a 
substrate of peroxidases. 

In general, very little is known about the possible func- 
tions of shell matrix proteins, but in some cases similar- 
ities to known proteins and predicted domain structures 
may provide some clues for further studies. Predicted 



Table 1 Fifty-seven proteins with an individual percentage of equal to or larger than 0.1% constitute 98% of the total identified proteome

Protein | Accession-no. | % of total identified proteome | Phosphorylation

Aspartate-, glycine-, lysine- and serine-rich protein/B3A0P1/peroxidase-like protein 1; domain: haem_peroxidase (~aa40-675); 20% G, 12% S; pI 4.96; GO: extracellular; DS: most of aa680-1870
Proline-rich protein 1/B3A0Q1; 11% A, 13% P; pI 9.72; GO: extracellular; DS: C-terminal 15aa
Glycine- and methionine-rich protein/B3A0R1; 12% A, 20% G, 10% L, 18% M; pI 11.24; GO: extracellular; DS: aa125-225
Glycine- and serine-rich protein-1/B3A0P6; 10% A, 20% G, 13% S; pI 9.0; GO: extracellular; DS: ~aa67-84 (18aa)
Peroxidase-like protein 2/B3A0P3; domains: haem_peroxidase (~aa666-1124); 13% G, 11% S; pI 8.52; GO: extracellular; DS: ~aa1-620, aa1197-1492
Glycine-rich protein/B3A0R2; 10% A, 16% C, 12% M, 10% L; pI 9.87; GO: extracellular; DS: aa127-145 (19aa)
Uncharacterized shell protein 5/B3A0Q0; % A, 11% R, 11% L; pI 10.32; GO: extracellular; DS: short stretches especially in C-terminal half
Coiled-coil domain-containing protein 1/B3A0Q3; domain: coil; 31% D; pI 3.55; GO: extracellular; DS: short stretches all over aa27-394
Similar to blue mussel shell protein (BMSP)/similar to collagen α4 (VI); domains: VWA; 11% I; pI 8.33; GO: extracellular; DS: none
Uncharacterized shell protein 13/B3A0R3; 10% G; pI 8.32; GO: extracellular; DS: ~aa180-291
Uncharacterized shell protein 16/B3A0R5; pI 9.63; GO: extracellular; DS: none
Proline-rich protein 2/B3A0R8; 16% P; pI 9.98; GO: extracellular; DS: short stretches especially in aa161-186
Glycine-, glutamate- and proline-rich protein/B3A0P5; domain: Lysozyme_like (~aa240-415); 12% Gly; pI 5.08; GO: extracellular; DS: aa73-137, aa201-218
Methionine-rich protein/B3A0R7; 10% N, 11% P; pI 9.62; GO: extracellular; DS: ~aa50-400
Uncharacterized shell protein 26/B3A0P4/BMSP-like; 18% G, 12% S, 10% T; pI 9.11; GO: extracellular; DS: small segments scattered over entire sequence
Uncharacterized shell protein 8/B3A0Q4; 11% P, 10% Y; pI 9.71; GO: extracellular; DS: short regions interspersed throughout the sequence
Uncharacterized protein; % Q (C-term), 11% P; pI 9.67; GO: none; DS: small segments scattered over entire sequence
Uncharacterized/similar to superoxide dismutase; domain: SOD; 12% P, 10% Q; pI 9.30; GO: intracellular/extracellular; DS: ~aa20-450; SOD: ~aa480-635
SCP domain-containing protein 2/B3A0P8; domain: CAP (~aa145-310); pI 9.56; GO: extracellular; DS: ~aa16-155
Similar to nacrein-like protein/putative carbonic anhydrase 1/B3A0P2; domain: α-carbonic anhydrase; pI 6.44; GO: extracellular; DS: none
Putative carbonic anhydrase 2; aa190-632 100% identity to CAH2/B3A0Q6; domain: α-CA (~aa85-411); 11% R, 13% D, 13% G; pI 5.87; GO: extracellular; DS: aa415-633
Uncharacterized protein; 10% A, 12% L; pI 9.77; GO: extracellular; DS: few to none
Uncharacterized protein; domain: CBM_14 (chitin-binding)/peritrophin A (~aa18-87); pI 6.65; GO: extracellular; DS: none
Uncharacterized protein; domain: IGFBP_Nterm; 11% C, 10% S; pI 9.03; GO: extracellular; DS: none



Lotgi1|162078 DGLSP_LOTGI^
Lotgi1|235497^ PRP1_LOTGI^
Lotgi1|239174' GMP_LOTGI^
Lotgi1|239214 GSP1_LOTGI^
Lotgi1|232817^ PLSP2_LOTGI^
Lotgi1|239170' GRP_LOTGI^
Lotgi1|238831' USP5_LOTGI^
Lotgi1|233420' CCD1_LOTGI^
Lotgi1|140660' Lotgi1|173139^
Lotgi1|234885' USP13_LOTGI^
Lotgi1|231046' USP16_LOTGI^
Lotgi1|230510' PRP2_LOTGI^
Lotgi1|231311' GEPRP_LOTGI^
Lotgi1|173200' MRP_LOTGI^
Lotgi1|238526' USP26_LOTGI^
Lotgi1|228268' USP8_LOTGI^
Lotgi1|158113'
Lotgi1|166131 Lotgi1|101611'
Lotgi1|233200' SCP2_LOTGI^
Lotgi1|238082' CAH1_LOTGI^
Lotgi1|239188' CAH2_LOTGI^
Lotgi1|231009'
Lotgi1|173138',^
Lotgi1|174065'



16.71 

12.28 
9.14 
6.82 



5.91 
5.11 



2.81 

2.13 
2.01 
1.67 
1.45 

1.43 
1.42 

1.22 

1.19 
1.09 

0.97 
0.96 

0.88 

0.87 
0.87 
0.81 



(+) 

(+)
++ 






Uncharacterized shell protein 4/B3A0P9; 10% S, 12% Y; pI 8.89; GO: extracellular; DS: possibly short C-term segment
Glycine- and tyrosine-rich protein/B3A0Q2; 14% G, 13% T; pI 5.43; GO: extracellular; DS: most of the sequence
aa151-448 96% identity to coiled-coil domain-containing protein 2/B3A0Q7; 10% D, 20% G (GM/GGG-rich C-terminus, ~aa430-630); pI 3.77; GO: extracellular; DS: most of aa290-410
Uncharacterized protein; domains: antistasin, WAP; 16% C, 11% P; pI 5.62; GO: extracellular; DS: none
Uncharacterized protein/glycosidase 2; domain: DUF187; similar to GEPRP_LOTGI (37% identity); pI 4.76; GO: extracellular; DS: ~aa78-130
Uncharacterized protein/similar to ER aminopeptidase; domain: peptidase_M1, ERAP1_LIKE_C; pI 8.94; GO: ER/Golgi/ext. plasma membrane; DS: none
SCP domain-containing protein 1/B3A0P7; domain: CAP (aa143-305); 11% S; pI 9.21; GO: extracellular; DS: ~aa20-110
Uncharacterized Gly-rich protein; 12% N, 22% G; pI 9.54/9.30; GO: extracellular; DS: ~aa40-200 (275200)
Similar to chorionic proteinase inhibitor/perlwapin-like; domains: WAP (5x); aa1-125 99.6% identity to B3A0S1; 11% C, 10% P; pI 7.84; GO: extracellular; DS: none
Uncharacterized protein; pI 9.49; GO: none; DS: none
Proline-rich protein 3/B3A0S4; 10% N, 11% G, 13% P; pI 9.56; GO: extracellular; DS: few short segments
EGF-like domain-containing protein 1 (aa170-682 of entry)/B3A0R6; domains: EGF (aa241-277), zona_pellucida (ZP; aa284-534); pI 5.80; GO: extracellular; DS: ~aa525-620
Peroxidase-3/B3A0Q8; domain: haem_peroxidase (aa531-1077); 13% N; pI 7.5; GO: extracellular; DS: 26-381
Uncharacterized protein/LUSP_10; 16% A, 17% D; pI 3.82; GO: extracellular; DS: most of the sequence
Uncharacterized protein; Pro/Ala- and His-rich motifs in C-term; pI 8.78; GO: extracellular; DS: short segments scattered over entire sequence
Similar to peptidyl-prolyl cis/trans isomerase/B3A0R0; domain: cyclophilin_type_PPI; 13% G; pI 4.75; GO: extracellular; DS: none
Uncharacterized; domains: VWC/pacifastin; 13% C, 12% D, 11% S; pI 3.87; GO: extracellular; DS: none
Uncharacterized Gln-rich protein; 26% Q, 13% L, 12% T; pI 4.02; GO: extracellular; DS: ~aa40-320
Uncharacterized Pro-rich protein; 15% P; pI 9.50; GO: extracellular; DS: aa32-416
Uncharacterized protein/LUSP-18; 15% P, 15% T; pI 5.73; GO: extracellular; DS: ~aa18-557
EGF-like domain-containing protein 2/B3A0S3; domains: EGF (aa73-109), ZP (aa116-370); pI 4.9; GO: extracellular; DS: few (aa364-386, 403-425)
Uncharacterized protein/similar to Pif; 41% identity to PIF_PINFU aa427-526; domain: ConA_like_lectin; pI 8.91; GO: extracellular; DS: none
Uncharacterized protein/LUSP-14; domain: chitin_binding_3; pI 8.77; GO: extracellular; DS: aa225-251
Uncharacterized protein; 28% identical to PIF_PINFU; domains: VWA, chitin-binding, ConA_like_lectin; pI 5.15; GO: extracellular; DS: none



Lotgi1|236183' USP4_LOTGI^
Lotgi1|235621' GTRP_LOTGI^
Lotgi1|234936 CCD2_LOTGI^
Lotgi1|239125' Lotgi1|226725
Lotgi1|174920',^
Lotgi1|140786' Lotgi1|225855
Lotgi1|233199' SCP1_LOTGI^
Lotgi1|239447' Lotgi1|175200
Lotgi1|239121 Lotgi1|201802 PWAPL_LOTGI^
Lotgi1|234387'
Lotgi1|237996' Lotgi1|172116 PRP3_LOTGI^
Lotgi1|235548' ELDP1_LOTGI^
Lotgi1|232818' Lotgi1|99809 PLSP3_LOTGI^
Lotgi1|163637',^
Lotgi1|233397' Lotgi1|163339
Lotgi1|222979' Lotgi1|169679 PPI_LOTGI^
Lotgi1|230854' Lotgi1|99757
Lotgi1|159331'
Lotgi1|174003'
Lotgi1|235610',^
Lotgi1|167423 ELDP2_LOTGI^
Lotgi1|237510' Lotgi1|171086
Lotgi1|226726' Lotgi1|239129^
Lotgi1|228264'



0.77 
0.71 
0.67 

0.66 
0.64 

0.61 

0.53 
0.47
0.39 

0.38 
0.34 

0.27 

0.26 

0.25 
0.24 

0.24 

0.23 
0.22 
0.22 
0.20 
0.19 

0.16 

0.16 
0.15 



Lotgi1|234884' Lotgi1|166202



0.14 






Uncharacterized Gln-rich protein; domain: Sushi/SCR/CCP (aa158-212); 19% Q, 11% P; pI 9.19; GO: extracellular; DS: most
of the sequence



Uncharacterized protein; aa1-138 100% identity to ASRP/B3A0S2; 10% A, 10% N, 19% D, 11% V; pI 3.73 (acid C-term half); GO: extracellular; DS: aa43-232 | Lotgi1|238358 ASRP_LOTGI^ | 0.14 | +
Uncharacterized protein; 13% S; pI 4.43; GO: extracellular; DS: aa47-338 | Lotgi1|71084' | 0.11 | +
Perlustrin-like/B3A0Q9; 43% identity to PLS_HALLA; domain: IGFBP_N; 11% C, 11% E; pI 4.05; GO: extracellular; DS: none | Lotgi1|238970^ PLSLP_LOTGI^ | 0.11
Uncharacterized protein; 10% Q, 10% P, 11% S; pI 9.79; GO: extracellular; DS: ~aa90-928 | Lotgi1|58316' | 0.10
Uncharacterized protein; domain: SOUL_haem_binding; 13% L; pI 6.96; GO: extracellular; DS: none | Lotgi1|205030^ Lotgi1|237594 | 0.10
Uncharacterized protein; 11% E; pI 4.32; GO: none (transmembrane?); DS: aa426-669 and smaller segments | Lotgi1|54020' | 0.10 | ++
Uncharacterized shell protein 22/B3A0S0; 21% Q, 18% P; pI 8.43; GO: extracellular; DS: most of the sequence | Lotgi1|236690^ USP22_LOTGI^ | 0.10
Uncharacterized protein/LUSP-20; domains: chitin_binding CBM_14/peritrophin A (aa384-504); 13% T; pI 6.79; GO: extracellular; DS: most of ~aa60-380 | Lotgi1|239574',^ | 0.10

+, less than three peptides phosphorylated; ++, three or more phosphopeptides; (+), not confirmed with phosphopeptide-enriched samples. DS, predicted disordered structure. ', previously identified by Mann et al.,
2012 [17]; ^, previously identified by Marie et al., 2013 [18].
Figure 1 Alignment of EFCB2 to similar sequences. Sequences covered by MS/MS-sequenced peptides are shown in red. Slashes in the
sequence of Lotgi1|239519 indicate an insert between signal peptide and the EFCB2-like sequence that does not occur in the other entries. All
shown entries were part of protein groups containing other similar sequences due to the high redundancy of the AllModels database.
Aligned entries: Lotgi1|157683, Lotgi1|230732, Lotgi1|230731, Lotgi1|157689, EFCB2/B3A0R9, Lotgi1|231426, Lotgi1|157690 and Lotgi1|239519.



domain structures, GO terms for subcellular location,
unusual amino acid composition features (amino acids
representing >10% of the sequence) and theoretical
isoelectric point for major identified Lotgi entries are
included in Table 1. Extremely acidic matrix proteins
(pI below 4.5) have attracted much interest in
biomineralization research because of the possibility of
direct interaction with the positively charged biomineral
cations, and they have been hypothesized to act as
nucleation sites involved in crystal formation [47]. The
group of 57 proteins with an abundance of >0.1% includes
eight such uncharacterized, unusually acidic proteins
(Table 1) that may deserve to be studied in more detail.
Many proteins isolated from biominerals contain sequence
regions of intrinsically disordered structure, a feature
that is implicated in protein-protein interaction and
mineral binding [48,49]. Table 1 includes several proteins
with extended sequence regions of predicted disordered
structure, such as the peroxidase-like protein-1
(DGLSP_LOTGI), the methionine-rich protein MRP_LOTGI,
peroxidase-like 3 (PLSP3_LOTGI), and the uncharacterized
proteins in Lotgi1|163637, 159331, 235610, 234884, 171084,
158316, 236690, and 239574. In two sequences both features,
unusual acidity and predicted long-range structural
disorder, coincide (Lotgi1|159331, 171084). However, like
all predicted features, predicted structural disorder needs
experimental validation before far-reaching conclusions
can be drawn.
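
The composition feature reported in Table 1 (amino acids representing more than 10% of a sequence) can be illustrated with a minimal sketch. The function name `enriched_residues` and the toy sequence are hypothetical and are not taken from the study; the actual Table 1 values were derived from the database entries themselves.

```python
from collections import Counter


def enriched_residues(sequence: str, threshold: float = 0.10) -> dict:
    """Return residues whose fraction of the sequence exceeds `threshold`."""
    seq = sequence.upper()
    total = len(seq)
    if total == 0:
        return {}
    counts = Counter(seq)
    return {aa: n / total for aa, n in counts.items() if n / total > threshold}


# Hypothetical toy sequence (20 residues), not a Lottia database entry.
toy_sequence = "QQQPPPSSSAGKLDEQPSTV"
toy_enriched = enriched_residues(toy_sequence)
```

With this toy input, Q, P and S each make up 20% of the sequence and are the only residues returned.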

Sometimes predicted domains strongly indicate involvement
of the respective protein in biomineralization events.
The putative carbonic anhydrases encoded
in Lotgi1|238082/CAH1 and Lotgi1|239188/CAH2 and
discussed previously [18] may be important for carbonate
ion delivery. Also of special interest are proteins containing
chitin-binding domains, such as Lotgi1|226726, 228264,
and 239574. Many mollusc shells contain chitin-based
extra-crystalline scaffolds, and chitin-binding proteins
may be important for organizing such scaffolds or may
mediate interactions between chitin and the calcified
matrix [50]. However, for most proven and putative shell
matrix proteins the function remains unknown at present.

Most of the identified proteins were only minor, or
trace, components that may not have a function in
biomineralization. However, it should be emphasised that
there may be exceptions. For example, protein FAM20C
(0.006% of the Lottia shell proteome; Additional file 2:
Table S2) was recently identified as a Golgi apparatus
kinase responsible for the phosphorylation of many
secreted proteins, including proteins important for
biomineralization [3,4]. This kinase is also secreted to some
degree, may be active in the extracellular space [5], and
may enter biominerals in the company of its substrates.
Of course this does not imply any function within the
matrix but may explain its presence there. Other examples
of the possible importance of trace components for
biomineral formation are the sea urchin spicule proteins
P58-A and P58-B. The extracellular domains of these
predicted transmembrane proteins were detected as
minor components in sea urchin spicule matrix [51] and
both were subsequently shown by knock-down experiments
to play an essential role in sea urchin larval
skeletogenesis [52]. Also among the trace components are
proteins known to have a predominantly intracellular
location, such as cytoskeletal components and cytosolic
enzymes (Additional file 2: Table S2). We think that
these proteins do not have a function in biomineralization.
However, even trace components with a well-defined
intracellular role, such as ubiquitin (which is now also
known to occur in the extracellular space [53]), may
have a true role in biomineralization, such as
in the matrix of the Pinctada fucata shell prismatic
layer [54]. Finally, it should be considered that the
number of up-regulated genes, for instance after shell
damage [55], is usually much larger than the number
of major proteins identified in shell matrices. Possibly
many of the trace proteins reflect regulatory or catalytic
processes involved in the mineralization event at
some point.

The phosphoproteome 

The shell matrix contains a low number of different
proteins, and the HCD (higher energy collisional
dissociation) fragmentation method used in the
previous shell proteome analysis [17] enables
phosphopeptide analysis at high resolution and mass accuracy in
the LTQ Orbitrap Velos [56,57] without the need for
neutral loss-dependent MS3 or multistage activation [58]
used previously with CID fragmentation. We therefore included
phosphorylation as a variable modification in this re-analysis.
The results indicated (Additional file 1: Table S1) that
several major and a few minor proteins were
phosphorylated to a variable extent. These preliminary results
were validated by analysis of phosphopeptide-enriched
samples of shell matrix proteins (Additional file 4:
Table S4). Thirteen of these phosphoproteins were confirmed
by analyzing phosphopeptide-enriched fractions. Three more
were identified only in phosphopeptide-enriched samples
(Additional file 4: Table S4), yielding a total of 20
phosphoproteins. The MaxQuant phosphopeptide output
table is shown in Additional file 5: Table S5. Nine major
proteins each representing more than 1% of the identified
protein, and five representing between 0.1%
and 1% (Table 1), were identified as phosphoproteins.
Simultaneous determination of phosphorylated and
non-phosphorylated versions of the phosphopeptides in the
general survey without prior enrichment enabled an
approximate estimation of site occupancy (Additional file 4:
Table S4), which was very low in most cases. Site
occupancy in the group of major proteins was highest in
GEPRP/B3A0P5 and the uncharacterized protein of
Lotgi1|154020. While GEPRP contained only two closely
spaced phosphorylation sites, Lotgi1|154020 contained
four sites in three peptides (Additional file 4: Table S4).
This high site occupancy strongly indicates that
phosphorylation of these proteins may be functionally important.
Three proteins, DGLSP/B3A0P1, PLSP2/B3A0P3 and
CCD1/B3A0Q3, yielded more than three phosphopeptides
with variable site occupancy (Additional file 4: Table S4).
Of these, Coiled-coil domain-containing protein 1 (CCD1)/
B3A0Q3 was previously shown to be extremely acidic
[18], a feature that is enhanced by phosphorylation.
This may be taken as a further indication of a very
important, but as yet not understood, role of this protein in Lottia
shell assembly.
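
The approximate site-occupancy estimate mentioned above can be sketched as a simple intensity ratio. This is only an illustration under stated assumptions: the function name `site_occupancy` is hypothetical, the phosphorylated and non-phosphorylated peptide forms are assumed to ionize and be detected equally well, and the exact calculation used for Additional file 4 may differ.

```python
def site_occupancy(phospho_intensity: float, unmodified_intensity: float) -> float:
    """Approximate fraction of a site that is phosphorylated.

    Assumes both peptide forms are detected with equal efficiency,
    which is why the result is only a rough estimate.
    """
    total = phospho_intensity + unmodified_intensity
    if total <= 0:
        raise ValueError("at least one intensity must be positive")
    return phospho_intensity / total


# Hypothetical intensities, not measured values from this study.
example_occupancy = site_occupancy(2.0e6, 8.0e6)
```

In this hypothetical case one fifth of the summed signal comes from the phosphorylated form, giving an estimated occupancy of 0.2.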

Taking into account the number of phosphorylation
sites and site occupancy, CCD1/B3A0Q3 may be considered
the major phosphoprotein of the Lottia gigantea
shell matrix. We want to point out, however, that densely
phosphorylated proteins with highly repetitive sequences,
such as dentin phosphophoryn, which contains almost
exclusively aspartic acid, asparagine and phosphoserine
[2], require special techniques to be identified and may
be missing from our analysis.

A search of the sequences surrounding the phospho sites for
known kinase motifs indicated that approximately one
third (16 of 46) of the unique S/T phospho sites comply
with the FAM20C recognition site S-x-E or related motifs
(S/T-x-E/D/pS/pT) [3,4]. This percentage is in good
agreement with the approximately 24% of human secreted
phosphoproteins modified at the serine of the canonical
FAM20C motif S-x-E [6]. However, much less is known
about phosphorylation in invertebrate secreted proteins
and the kinases involved. Therefore it is unknown whether
these recognition sites are conserved between vertebrates
and invertebrates. Five of the sites identified are in
agreement with the typical casein kinase 2 motif S-x-x-E, also
modified in the mammalian mineralization-inhibiting
protein osteopontin, and ten sites comply with the
casein kinase 1 motif (D/E)n-x-x-S/T [1], indicating that
secreted or membrane-bound kinases with
casein-kinase-like activity are involved. Evidence for such
kinases is summarized in [5,6].
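
The three motif definitions quoted above can be expressed as a minimal position-based check. This sketch is not the search procedure used in the study: the function name `matching_motifs`, the 0-based indexing, and the toy sequence are assumptions made for illustration, and (D/E)n is read here as "at least one acidic residue three positions N-terminal of the site".

```python
ACIDIC = {"D", "E"}
PHOSPHO_ACCEPTORS = {"S", "T"}


def matching_motifs(sequence, site, phospho_sites=frozenset()):
    """Return the motif names matched by the S/T residue at 0-based index `site`."""

    def residue(offset):
        index = site + offset
        return sequence[index] if 0 <= index < len(sequence) else None

    if residue(0) not in PHOSPHO_ACCEPTORS:
        return []

    motifs = []
    # FAM20C-like: S/T-x-E/D/pS/pT (position +2 acidic or itself phosphorylated)
    plus2 = residue(2)
    plus2_is_phospho = plus2 in PHOSPHO_ACCEPTORS and (site + 2) in phospho_sites
    if plus2 in ACIDIC or plus2_is_phospho:
        motifs.append("FAM20C-like")
    # Casein kinase 2: S-x-x-E
    if residue(0) == "S" and residue(3) == "E":
        motifs.append("CK2")
    # Casein kinase 1: (D/E)n-x-x-S/T, read as an acidic residue at position -3
    if residue(-3) in ACIDIC:
        motifs.append("CK1")
    return motifs


# Hypothetical toy peptide, not a Lottia sequence.
toy_peptide = "GDAKSPEASAAE"
site4_motifs = matching_motifs(toy_peptide, 4)
site8_motifs = matching_motifs(toy_peptide, 8)
```

A single site can satisfy more than one motif, so counts per motif need not add up to the number of sites.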
Figure 2 An example of different partially occupied phospho
sites in one sequence. This peptide occurs in the sequence of
DGLSP/B3A0P1/Lotgi1|162078 (Aspartate-, glycine-, lysine- and
serine-rich protein, aa324-335). A, peptide variant with phosphotyrosine
identified by an uninterrupted series of y-ions for the rest of the
sequence and the very intense diagnostic pY immonium ion at m/z
216.042. Expert annotations [29] were omitted, except for the major
peak at m/z 120.0809 (phenylalanine immonium ion), to keep the
spectrum clear. The doubly charged peptide ion was measured
with a mass error of -0.014 ppm. PEP and phosphorylation site
localization probability were calculated by MaxQuant to be 8.96e-93
and 0.999. B, this time S4 was determined as the phosphorylation site
in an uninterrupted series of y-ions from y1 to y11. The mass error
was -0.490 ppm, PEP was 1.16e-54 and the localization probability was
1.00. Major peaks at m/z 120.0809 and 136.0756 were annotated by
the MaxQuant Expert system as the phenylalanine immonium ion
and the a1-ion. A major peak at m/z 192.1016 was not annotated.
Expert annotations of most of the minor peaks are omitted for clarity.
C, a third phosphorylation site at S9 was detected with a localization
probability of 1.00 in still another variant of this peptide measured with
a mass error of 0.531 ppm and with a PEP of 3.28e-164. Again, most
expert annotations are omitted. *, ions showing a loss of H3PO4 from
phosphoserine. Y-ions are shown in red, b-ions are shown in blue, b- or
y-ions with a loss of ammonia or water are in orange, the ion is
shown in light blue, black identifies ions without annotation unless the
annotation is shown on top of the peak.
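
The diagnostic m/z values and ppm mass errors quoted in the Figure 2 legend can be checked with a short calculation. This is a sketch using standard monoisotopic masses; the function names and the rounded constants are illustrative assumptions and are not part of the MaxQuant output.

```python
PROTON = 1.007276        # mass of a proton
CO = 27.994915           # carbon monoxide, lost when an immonium ion forms
HPO3 = 79.966331         # mass added by phosphorylation
RESIDUE_MASS = {"F": 147.068414, "Y": 163.063329}  # monoisotopic residue masses


def immonium_mz(residue: str, phospho: bool = False) -> float:
    """m/z of the singly charged immonium ion of `residue`."""
    mz = RESIDUE_MASS[residue] - CO + PROTON
    return mz + HPO3 if phospho else mz


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


phe_immonium = immonium_mz("F")                 # phenylalanine immonium ion
ptyr_immonium = immonium_mz("Y", phospho=True)  # phosphotyrosine immonium ion
```

The computed values agree with the phenylalanine immonium ion near m/z 120.08 and the pY immonium ion at m/z 216.042 named in the legend.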



Conclusions 

Our approach to proteomes of invertebrate biominerals
consists of washing the biominerals with hypochlorite in
a less stringent way than proposed recently [59], to
preserve extra-crystalline matrix components, and of
identifying as many proteins as possible after in-gel digestion of
slices of the entire gel [17] irrespective of staining
intensity, or after in-solution digestion using filter-aided sample
preparation (FASP) [20]. Protein identification includes
quantitation, which was previously [17] done using the
exponentially modified protein abundance index (emPAI) [41].
This was recently replaced [60] by the more
accurate automated iBAQ method [19] as implemented in
more recent versions of MaxQuant. We believe that this
approach is well suited to identify candidates for
functional matrix proteins, most likely found among the most
abundant components, while retaining all of the
information about trace components, irrespective of whether
these may have a function in biomineralization or not, and
irrespective of whether they are intra-crystalline or belong
to the extra-crystalline matrix. Proteins predominantly
located intracellularly, such as cytoskeletal components,
ribosomal proteins, proteasome subunits or cytoplasmic
enzymes, belong to the minor components of the Lottia
shell proteome (Additional file 2: Table S2), constituting
only an insignificant fraction of the total. However, the
identification and quantitation of such proteins may also
depend in some way on the biomineral examined, the
instrumentation used, and the washing procedures applied
to the shell, and we agree with others [59,61] that the mere
presence of such proteins in the matrix sample
certainly does not imply a function.
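
The two quantitation measures named above, and the selection of the most abundant proteins, can be sketched as follows. The formulas follow the published definitions as we understand them (iBAQ as summed peptide intensity divided by the number of theoretically observable peptides; emPAI as 10 to the power of observed over observable peptides, minus 1). The function names and the toy values are hypothetical, and MaxQuant's own implementation may differ in detail.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """Summed peptide intensity divided by the number of observable peptides."""
    if n_theoretical_peptides <= 0:
        raise ValueError("number of theoretical peptides must be positive")
    return sum(peptide_intensities) / n_theoretical_peptides


def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index."""
    return 10 ** (n_observed / n_observable) - 1


def percent_of_total(ibaq_values):
    """Express each protein's iBAQ value as a percentage of the summed iBAQ."""
    total = sum(ibaq_values.values())
    return {name: 100.0 * value / total for name, value in ibaq_values.items()}


def major_proteins(percentages, coverage=98.0):
    """Most abundant proteins that together reach `coverage` percent."""
    selected, cumulative = [], 0.0
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    for name, pct in ranked:
        if cumulative >= coverage:
            break
        selected.append(name)
        cumulative += pct
    return selected


# Hypothetical iBAQ values, not data from this study.
toy_ibaq = {"protein_A": 90.0, "protein_B": 8.5, "protein_C": 1.0, "protein_D": 0.5}
toy_major = major_proteins(percent_of_total(toy_ibaq))
```

Applied to a real protein table, the same cumulative cut-off is one way to arrive at a short list such as the 57 proteins that make up 98% of the identified matrix proteome.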

The group of major proteins also contains several
phosphoproteins. Those yielding high-occupancy phospho
sites and/or many phosphorylated sequence-unique
peptides were already identified without prior
phosphopeptide enrichment in a general survey. However,
subtleties such as the occurrence of different sites with high
localization probability within one peptide sequence
(Figure 2) are more likely detected with the higher copy
numbers usually provided by phosphopeptide-enriched
samples. Nevertheless, inclusion of phosphorylation
among the variable modifications in general studies of
low-complexity proteomes may give an overview of what to
expect with phosphopeptide-enriched samples and may
provide a rough estimate of phospho site occupancies.

Additional files 
Additional file 1: Table S1. This table shows the complete list of
identified proteins/protein groups, including identifications that were not
accepted following closer inspection, for instance because only one
peptide was sequenced with insufficient sequence coverage. The table
includes relevant parameters such as additional accession
numbers for protein groups, scores or molecular weight of predicted
proteins. Due to the simultaneous use of two databases and the high
redundancy of the AllModels database, a few groups contained so
many similar entries that the Excel program created extra cells to
accommodate all data. This disrupted the regular pattern of lines and
columns of the sheet. However, the start of new groups is easily
recognizable by >jgi|Lotgi1| followed by the accession code.

Additional file 2: Table S2. In contrast to Table S1, this table only lists
accepted protein/protein group identifications.

Additional file 3: Table S3. This MaxQuant output table shows all
peptides leading to identifications in Table S1, their sequences, scores, and
other relevant parameters. Due to the simultaneous use of two databases and
the high redundancy of the AllModels database, some peptides appeared in
so many similar entries that the Excel program created extra cells to
accommodate all data. This disrupted the regular pattern of lines and columns
of the sheet. However, the start of new peptide entries is clearly recognizable
by the peptide sequence. Peptides appear in alphabetical order.

Additional file 4: Table S4. List of identified and accepted
phosphopeptides and phosphoproteins from the general proteomic
survey and from analysis of phosphopeptide-enriched samples.

Additional file 5: Table S5. This table essentially contains the
MaxQuant Phospho(STY)Sites output file with all relevant parameters
such as sequences, scores, and localization probabilities.